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Abstract 

This paper surveys some well-established approaches on the approximation 



of Bayes factors used in Bayesian model choice, mostly as covered in Chen 



et al. (2000). Our focus here is on methods that are based on importance sam- 
pling strategies — rather than variable dimension techniques like reversible jump 
MCMC — , including: crude Monte Carlo, maximum likelihood based importance 
sampling, bridge and harmonic mean sampling, as well as Chib's method based 
on the exploitation of a functional equality. We demonstrate in this survey how 
these different methods can be efficiently implemented for testing the significance 
of a predictive variable in a probit model. Finally, we compare their performances 
on a real dataset. 
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1 Introduction 



The contribution of Jim Berger to the better understanding of Bayesian testing is 
fundamental and wide-ranging, from establishing the fundamental difficulties with p- 



values in Berger and Sellke (1987) to formalising the intrinsic Bayes factors in Berger 



and Pericchi (1996), to solving the difficulty with improper priors in Berger et al. (1998), 



and beyond! While our contribution in this area is obviously much more limited, we 
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aim at presenting here the most standard approaches to the approximation of Bayes 
factors. 

The Bayes factor indeed is a fundamental procedure that stands at the core of the 



Bayesian theory of testing hypotheses, at least in the approach advocated by both Jef- 



freys (1939) and by Jaynes (2003). (Note that Robert et al. 2009, provides a reassess- 



ment of the crucial role of Jeffreys! 1939| in setting a formal framework for Bayesian 
testing as well as for regular inference.) Given an hypothesis Hq : 9 e Bo on the pa- 
rameter 6 G 6 of a statistical model, with observation y and density f(y\8), under a 
compatible prior of the form 

7r(e o )7r o (0)+7r(eS)7ri(0), 
the Bayes factor is defined as the posterior odds to prior odds ratio, namely 
7r(e |y) /7r(e ) 



An(y) 



f{y\0)« o (0)M / / f{y\ey l {e)de. 



7r(68|y)/ 7r(eg) JQo 
Model choice can be considered from a similar perspective, since, under the Bayesian 



paradigm (see, e.g., Robert 2001), the comparison of models 

where the family 3 can be finite or infinite, leads to posterior probabilities of the models 
under comparison such that 

P (971 = 9Jli\y) oc pi f fiiyieiWejdOi, 

where Pi = P(9Jt = 97tj) is the prior probability of model 97tj. 

In this short survey, we consider some of the most common Monte Carlo solutions 
used to approximate a generic Bayes factor or its fundamental component, the evidence 



rrii 



n{0i)fi{y\0i)Mi 



aka the marginal likelihood. 



e found in 


Carlin and Chib 


(1995) 


Friel and Pettitt 


(2008 


). Note that 



we only briefly mention here trans-dimensional methods issued from the revolutionary 



paper of Green (1995), since our goal is to demonstrate that within- model simulation 
methods allow for the computation of Bayes factors and thus avoids the additional 
complexity involved in trans-dimensional methods. While ameanable to an importance 
sampling technique of sorts, the alternative approach of nested sampling (Skilling 2006) 
is discussed in Chopin and Robert (2007) and Robert and Wraith (2009). 
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2 The Pima Indian benchmark model 



In order to compare the performances of all methods presented in this survey, we chose 
to evaluate the corresponding estimates of the Bayes factor in the setting of a single 
variable selection for a probit model and to repeat the estimation in a Monte Carlo 
experiment to empirically assess the variability of those estimates. 

We recall that a probit model can be represented as a natural latent variable model 
in that, if we consider a sample z±, . . . , z n of n independent latent variables associated 
with a standard regression model, i.e. such that Zj\0 ~ Af (xj6, l), where the x^'s are 
dimensional covariates and is the vector of regression coefficients, then yx, . . . ,y n 
such that 

y% = i Zi >o 

is a probit sample. Indeed, given 0, the y^s are independent Bernoulli rv's with P(y$ = 
1\0) = $ (xj0) where $ is the standard normal cdf. 

The choice of a reference prior distribution for the probit model is open to debate, 



but the connection with the latent regression model induced Marin and Robert (2007) 
to suggest a g-prior model, ~ J\f (0 P , n(X T X)~ 1 ) , with n as the g factor and X as 
the regressor matrix. The corresponding posterior distribution is then associated with 
the density 

n 

n(0\y,X) oc J] {1 " $ $ (xfO) w x exp {-0 T (X T X)0/2n} , (1) 

i=i 

where y = (yi, . . . , y n ). In the completed model, i.e. when including the latent variables 
z = (zi, . . . , z n ) into the model, the y^s are deterministic functions of the Zj's and the 
so-called completed likelihood is 



/(y, z\0) = (2tt)-"/ 2 exp - £ ( Zi - x^) 2 /2 J] (^^o + W 



Zi>0J 



The derived conditional distributions 

Hve-I"^^' 1 ' ) if 2/1 = 1 ' (2) 
\Ar_(x^,l,0) if w = 0, ^ 

are of interest for constructing a Gibbs sampler on the completed model, where A/+ 1, 0) 

denotes the Gaussian distribution with mean xj0 and variance 1 that is left-truncated 
at 0, while M- (xj0, 1, 0) denotes the symmetrical normal distribution that is right- 
truncated at 0. The corresponding full conditional on the parameters is given by 

0|y,z~Arf^-(X T X)- 1 X T z,^-(X T X)- 1 ) . (3) 
\n + 1 n + 1 / 

Indeed, since direct simulation from the posterior distribution of is intractable, Albert 



and Chib (1993) suggest implementing a Gibbs sampler based on the above set of full 
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conditionals. More precisely, given the current value of 8, one cycle of the Gibbs 
algorithm produces a new value for z as simulated from the conditional distribution 
([2]), which, when substituted into (J3|, produces a new value for 0. Although it does not 
impact the long-term properties of the sampler, the starting value of 6 may be taken 
as the maximum likelihood estimate to avoid burning steps in the Gibbs sampler. 

Given this probit model, the dataset we consider covers a population of women who 
were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona. 
These women were tested for diabetes according to World Health Organization (WHO) 
criteria. The data were collected by the US National Institute of Diabetes and Digestive 



and Kidney Diseases, and is available with the basic R package (R Development Core 



Team 2008). This dataset, used as a benchmark for supervised learning methods, 



contains information about 332 women with the following variables: 

- glu: plasma glucose concentration in an oral glucose tolerance test; 

- bp: diastolic blood pressure (mm Hg); 

- ped: diabetes pedigree function; 

- type: Yes or No, for diabetic according to WHO criteria. 

For this dataset, the goal is to explain the diabetes variable type by using the ex- 
planatory variables glu, bp and ped. The following table is an illustration of a classical 
(maximum likelihood) analysis of this dataset, obtained using the R glm() function with 
the probit link: 

Deviance Residuals: 

Min 1Q Median 3Q Max 

-2.1347 -0.9217 -0.6963 0.9959 2.3235 
Coefficients : 

Estimate Std. Error z value Pr(>|z|) 
glu 0.012616 0.002406 5.244 1.57e-07 *** 
bp -0.029050 0.004094 -7.096 1.28e-12 *** 
ped 0.350301 0.208806 1.678 0.0934 . 

Signif. codes: '***' 0.001 '**' 0.01 0.05 '.' 0.1 ' ' 1 
(Dispersion parameter for binomial family taken to be 1) 

Null deviance: 460.25 on 332 degrees of freedom 
Residual deviance: 386.73 on 329 degrees of freedom 
AIC: 392.73 

Number of Fisher Scoring iterations: 4 

This analysis sheds some doubt on the relevance of the covariate ped in the model 
and we can reproduce the study from a Bayesian perspective, computing the Bayes 
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factor B i opposing the probit model only based on the covariates glu and bp (model 
0) to the probit model based on the covariates glu, bp, and ped (model 1). This is 
equivalent to testing the hypothesis H : 9 3 = since the models are nested, where #3 
is the parameter of the probit model associated with covariate ped. (Note that there 
is no intercept in either model.) If we denote by X the 332 x 2 matrix containing the 
values of glu and bp for the 332 individuals and by Xi the 332 x 3 matrix containing 
the values of the covariates glu, bp, and ped, the Bayes factor £> i is given by 



(2vr) 1 / 2 n 



1/2 l/2l( X X o)| 1/2 



XfXOI-Va 

n 

- $ ((Xo)^)} 1 -^ ((X o ) v 0f exp {-6 T (X^X o )0/2n} d0 



(4) 



i=i 



„ n 

/ f[{l - $ (XO^)} 1 -^ (Xi)i,0) w exp {-0 T (X?X)0/27i} d0 
7 K 3 , =1 



^(Oa^CXTXo)" 1 ) 


n 

no 

_i=i 


-$((X o ) v 0)} 1 ^$((Xo) v 0) K 


E Ar 3 (03,n(XT Xl )-i) 


n 

j=i 


-$((X 1 ) v 0)} 1 -^$((X 1 ) v 0) K 



using the shortcut notation that Aj . is the i-th line of the matrix A. 



3 The basic Monte Carlo solution 

As already shown above, when testing for a null hypothesis (or a model) Hq : 9 e Go 
against the alternative hypothesis (or the alternative model) i?i : 9 G 61, the Bayes 
factor is defined by 



B i(y) 



f(y\0 o )M0o)d9 o 



e 



We assume in this survey that the prior distributions under both the null and the 
alternative hypotheses are proper, as, typically, they should be. (In the case of com- 
mon nuisance parameters, a common improper prior measure can be used on those, 
see Berger et al. (1998), Marin and Robert (2007). This obviously complicates the 



computational aspect, as some methods like crude Monte Carlo cannot be used at all, 
while others are more prone to suffer from infinite variance.) In that setting, the most 
elementary approximation to Boi(y) consists in using a ratio of two standard Monte 
Carlo approximations based on simulations from the corresponding priors. Indeed, for 
i = 0,1: 

' f(y\9)n t (9)d9 = E Wi [f(y\9)} . 
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Figure 1: Pima Indian dataset: boxplot of 100 Monte Carlo estimates of B 01 (y) based 
on simulations from the prior distributions, for 2 x 10 4 simulations. 

If #0,1; ■ ■ j 9o,n and 0^1, . . . , 9x )Tll are two independent samples generated from the prior 
distributions tt and ttl, respectively, then 

is a strongly consistent estimator of B x(y). 

In most cases, sampling from the prior distribution corresponding to either hypoth- 
esis is straightforward and fast. Therefore, the above estimator is extremely easy to 
derive as a brute-force evaluation of the Bayes factor. However, if any of the poste- 
rior distributions is quite different from the corresponding prior distribution — and it 
should be for vague priors — , the Monte Carlo evaluation of the corresponding evidence 
is highly inefficient since the sample will be overwhelmingly producing negligible val- 
ues of f(y\9ij). In addition, if f 2 (y\0) is not integrable against tt or n%, the resulting 
estimation has an infinite variance. Since importance sampling usually requires an 
equivalent computation effort, with a potentially highy efficiency reward, crude Monte 
Carlo approaches of this type are usually disregarded. 

Figure [T] and Table [T] summarize the results based on 100 replications of Monte Carlo 
approximations of B 01 (y), using equation ^ with n = n\ = 20,000 simulations. As 
predicted, the variability of the estimator is very high, when compared with the other 
estimates studied in this survey. (Obviously, the method is asymptotically unbiased 
and, the functions being square integrable in Q, with a finite variance. A massive 
simulation effort would obviously lead to a precise estimate of the Bayes factor.) 
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4 Usual importance sampling approximations 



Defining two importance distributions with densities w Q and wi, with the same supports 
as ttq and 7Ti, respectively, we have: 

B 01 (y)=E mo [f{y\9)iz Q {9)/w Q {9)]U^ [f ^9)^(9) /w 1 (9)} . 

Therefore, given two independent samples generated from distributions wq and w\, 
6*0,1, • • • , ^o,n an d $i,i> • • • 5 ^l.ni) respectively, the corresponding importance sampling 
estimate of B i(y) is 

1 Egi fo(z\0oj)M e oj)/™o(Oo,j) ^ 

Compared with the standard Monte Carlo approximation above, this approach offers 
the advantage of opening the choice of the representation ^ in that it is possible 
to pick importance distributions Wq and W\ that lead to a significant reduction in 
the variance of the importance sampling estimate. This implies choosing importance 
functions that provide as good as possible approximations to the corresponing posterior 
distributions. Maximum likelihood asymptotic distributions or kernel approximations 
based on a sample generated from the posterior are natural candidates in this setting, 
even though the approximation grows harder as the dimension increases. 

For the Pima Indian benchmark, we propose for instance to use as importance 
distributions, Gaussian distributions with means equal to the maximum likelihood (ML) 
estimates and covariance matrices equal to the estimated covariance matrices of the ML 
estimates, both of which are provided by the R glm() function. While, in general, those 
Gaussian distributions provide crude approximations to the posterior distributions, the 
specific case of the probit model will show this is an exceptionally good approximation 
to the posterior, since this leads to the best solution among all those compared here. 
The results, obtained over 100 replications of the methodology with no = n\ = 20, 000 
are summarized in Figure [2] and Table [TJ They are clearly excellent, while requiring the 
same computing time as the original simulation from the prior. 



5 Bridge sampling methodology 



The original version of the bridge sampling approximation to the Bayes factor (Gelman 



and Meng 1998, Chen et al. 2000) relies on the assumption that the parameters of 
both models under comparison belong to the same space: O = ©i- In that case, 
for likelihood functions f and fi under respectively models 3Jt and DJti, the bridge 
representation of the Bayes factor is 



B i(y) 



f (y\9)7T (9)d9 



f 1 (y\6)Tr 1 (6)d8 = E 



7T1 



01 



Mv\e)M0) 



(7) 
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IVIorite Carlo 



Importance sampling 



Figure 2: Pima Indian dataset: boxplots of 100 Monte Carlo and importance sampling 
estimates of B i(y), based on simulations from the prior distributions, for 2 x 10 4 
simulations. 

Given a sample from the posterior distribution of 9 under model 9JTi, 6>x i, . . . , 6 1>N ~ 
7Ti(6\y), a first bridge sampling approximation to Bqi(d) is 

AT-l V /0(l/|^l,j) 7r 0(^l,i) 

From a practical perspective, for the above bridge sampling approximation to be of any 
use, the constraint on the common parameter space for both models goes further in 
that, not only must both models have the same complexity, but they must also be pa- 
rameterised on a common ground, i.e. in terms of some specific moments of the sampling 
model, so that parameters under both models have a common meaning. Otherwise, the 
resulting bridge sampling estimator will have very poor convergence properties, possibly 
with infinite variance. 

Equatio n (|7|) i s nothing but a very special case of the general representation (Torrie 



and Valleau 1977) 



B 01 (y) = E v [f (y\e)n o (0)/<p{e)] / \fi{y\9)*x{&)/ <p{B)\ , 

which holds for any density ip with a sufficiently large support and which only requires a 
single sample 8i, . . . , On generated from (p to produce an importance sampling estimate 
of the ratio of the marginal likelihoods. Apart from using the same importance function 
(p for both integrals, this method is therefore a special case of importance sampling. 
Another extension of this bridge sampling approach is based on the general repre- 



sentation 



BoM = //o(y|0ko(%(0)7ri(%)d0/ fi(y\e)iri(6)a(e)ir o (e\y)d0 



Tli 

j=i 

no 



3=1 



3 1 



where 9 0i i, . . . , #o,n an d #1,1, • • • , 0i,m are two independent samples coming from the 
posterior distributions n (9\y) and 7Ti(d\y), respectively. That applies for any positive 
function a as long as the upper integral exists. Some choices of a lead to very poor per- 
formances of the method in connection with the harmonic mean approach (see Section 
[6]), but there exists a quasi-optimal solution, as provided by Gelman and Meng (1998): 



a*(y) oc 



n 7r (%) +ni7Ti(%) ' 
This optimum cannot be used per se, since it requires the normalising constants of 



both ir (9\y) and 7Ti(9\y). As suggested by Gelman and Meng (1998), an approximate 



version uses iterative versions of a*, based on iterated approximations to the Bayes 
factor. Note that this solution recycles simulations from both posteriors, which is 
quite appropriate since one model is selected via the Bayes factor, instead of using an 
importance weighted sample common to both approximations. We will see below an 
alternative representation of the bridge factor that bypasses this difficulty (if difficulty 
there is!). 

Those derivations are, however, restricted to the case where both models have the 
same complexity and thus they do not apply to embedded models, when Bo C 0i in 
such a way that Q\ = (9, ip), i.e., when the submodel corresponds to a specific value ipo 

ofV: /o(#) = /(l/|Mo). ' 

The extension of the most advanced bridge sampling strategies to such cases requires 
the introduction of a pseudo-posterior density, uj(ip\9,y), on the parameter that does 
not appear in the embedded model, in order to reconstitute the equivalence between 
both parameter spaces. Indeed, if we augment n (9\y) with uj(ip\9, y), we obtain a joint 
distribution with density no(9\y) x u(ip\9,y) on 6i. The Bayes factor can then be 
expressed as 



f{y\9, i; )7r (0)a(9, ^(9, ^\y)d9u(^\9, y) d^ 



6i 



B i(y) = 

I /(2/|Mki(M)a(M)*b(%) x u^\9,y)d9dip 

</6i 

for all functions a(9, because it is clearly independent from the choice of both a(9, ip) 
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and u(i/)\9,y). Obviously, the performances of the approximation 



in 



( n 1 f(y\ dl >ii ' l Po)7ro(9i >j )uj(ipx tj \e l!j ,y)a(9 l!j ,ipij) 

3=1 



i=i 

where (6 0>1 , ^o,i), • • • , (Oa,n ,i>o,n ) and (0i,i, ^i.i), . . . , (0i, ni , V'l.nJ are two independent 
samples generated from distributions 7r (6%) X o;(^|0, y) and 7r 1 (6 ) , V>|y)> respectively, do 
depend on this completion by the pseudo-posterior as well as on the function a(9,ip). 
Chen et al. (2000) establish that the asymptotically optimal choice for uj(ip\Q,y) is the 
obvious one, namely 

cu^\9,y) = n 1 ^\9,y), 

which most often is unavailable in closed form (especially when considering that the 
normalising constant of tu(ip\9, y) is required in (18])). However, in latent variable models, 
approximations of the conditional posteriors often are available, as detailed in Section 

m 

While this extension of the basic bridge sampling approximation is paramount for 
handling embedded models, its implementation suffers from the dependence on this 
pseudo-posterior. In addition, this technical device brings the extended bridge method- 



ology close to the cross-model alternatives of Carlin and Chib (1995) and Green (1995), 



in that both those approaches rely on completing distributions, either locally 



Green 



1995) or globally ( Carlin and Chib| 1995), to link both models under comparison in 



a bijective relation. The density u(i/)\9o,y) is then a pseudo-posterior distribution in 
Chib and Carlin's (1995) sense, and it can be used as Green's (1995) proposal in the 
reversible jump MCMC step to move (or not) from model 9Jl to model DJli. While 
using cross-model solutions to compare only two models does seem superfluous, given 
that the randomness in picking the model at each step of the simulation is not as useful 
as in the setting of comparing a large number or an infinity of models, the average 
acceptance probability for moving from model 97t to model SDTi is related to the Bayes 
factor since 

/(y|0,VHi(M) 



E 



7TQ 



f{y\eMMOMW,y) 



B i(y) 



even though the average 



E 



7TQ XLU 



mm 



f(y\9 ,^(9,^) 

f(y\9MM0Mmy) 



does not provide a closed form solution. 

For the Pima Indian benchmark, we use as pseudo-posterior density u(9 3 \9i,92,y), 
the conditional Gaussian density deduced from the asymptotic Gaussian distribution 
on (9\, 92, #3) already used in the importance sampling solution, with mean equal to the 
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Figure 3: Pima Indian dataset: (left) boxplots of 100 importance sampling, bridge 
sampling and Monte Carlo estimates of B m (y), based on simulations from the prior 
distributions, for 2 x 10 4 simulations; (right) same comparison for the importance sam- 
pling versus bridge sampling estimates only. 

ML estimate of (9\, 82, #3) and with covariance matrix equal to the estimated covariance 
matrix of the ML estimate. The quasi-optimal solution a* in the bridge sampling 
estimate is replaced with the inverse of an average between the asymptotic Gaussian 
distribution in model 97ti and the product of the asymptotic Gaussian distribution in 
model 9Jto times the above uj(9 3 \9i,92,y)- This obviously is a suboptimal choice, but 
it offers the advantage of providing a non-iterative solution. The results, obtained over 
100 replications of the methodology with n = n\ = 20, 000 are summarized in Figure [3] 
and Table [TJ The left-hand graph shows that this choice of bridge sampling estimator 
produces a solution whose variation is quite close to the (excellent) importance sampling 
solution, a considerable improvement upon the initial Monte Carlo estimator. However, 
the right-hand-side graph shows that the importance sampling solution remains far 
superior, especially when accounting for the computing time. (In this example, running 
20,000 iterations of the Gibbs sampler for the models with both two and three variables 
takes approximately 32 seconds.) 



6 Harmonic mean approximations 



While using the generic harmonic mean approximation to the marginal likelihood is 
l fra 
0,1 



often fraught with danger (Neal 1994) 



the representation (Gelfand and Dey 1994) 



E 



<Pk(0) 



*k(e)fk(y\e) 



y 



M0) *k(0)fk(y\e) 



Kk(0)fk(y\0) m k (y) 



d9 



m k (y) 



(9) 



holds, no matter what the density fk{8) is — provided <fk(9) = when nk(9) fk(y\9) = 
— . This representation is remarkable in that it allows for a direct processing of Monte 
Carlo or MCMC output from the posterior distribution 7Tk(9\y). As with importance 
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sampling approximations, the variability of the corresponding estimator of Bqi (y) will 
be small if the distributions <pk{@) (k = 0,1) are close to the corresponding poste- 
rior distributions. However, as opposed to usual importance sampling constraints, the 
density (fik(9) must have lighter — rather than fatter — tails than ftk{9)fk{y\9) for the 
approximation of the marginal rrik(y) 



N 



jr( n k (0k,j)fk(y\0k,j) 

to enjoy finite variance. For instance, using tpk(9) = ^k{9) as in the original harmonic 



mean approximation (Newton and Raftery 1994) will most usually result in an infi- 



nite variance estimator, as discussed by Neal (1994). On the opposite, using (pkS with 



constrained supports derived from a Monte Carlo sample, like the convex hull of the 
simulations corresponding to the 10% or to the 25% HPD regions — that again is eas- 
ily derived from the simulations — is both completely appropriate and implementable 



(Robert and Wraith 2009). 



However, for the Pima Indian benchmark, we propose to use instead as our dis- 
tributions ipk{9) the very same distributions as those used in the above importance 
sampling approximations, that is, Gaussian distributions with means equal to the ML 
estimates and covariance matrices equal to the estimated covariance matrices of the 
ML estimates. The results, obtained over 100 replications of the methodology with 
N = 20,000 simulations for each approximation of m^{y) (k = 0,1) are summarized 
in Figure [4] and Table [TJ They show a very clear proximity between both importance 
solutions in this special case and a corresponding domination of the bridge sampling 
estimator, even though the importance sampling estimate is much faster to compute. 
This remark must be toned down by considering that the computing time due to the 
Gibbs sampler should not necessarily be taken into account into the comparison, since 
samples are generated under both models. 



7 Exploiting functional equalities 

Chib's (1995) method for approximating a marginal (likelihood) is a direct application 
of Bayes' theorem: given y ~ fk(y\9) and 9 ~ 7Tfc(0), we have that 

_ f k (y\9)n k (6) 
mk ~ 7T k (9\y) ' 

for all #'s (since both the lhs and the rhs of this equation are constant in 9). Therefore, 
if an arbitrary value of 9, say 91, is selected and if a good approximation to itk(9\y) can 



be constructed, denoted ft(6\y), Chib's (1995) approximation to the evidence is 



Kk(9* k \y) 
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Importance sampling 



Figure 4: Pima Indian dataset: (left) boxplots of 100 bridge sampling, harmonic mean 
and importance sampling estimates of B i(y), based on simulations from the prior 
distributions, for 2 x 10 4 simulations; (right) same comparison for the harmonic mean 
versus importance sampling estimates only. 



In a general setting, it{9\y) may be the Gaussian approximation based on the MLE, 
already used in the importance sampling, bridge sampling and harmonic mean solu- 
tions, but this is unlikely to be accurate in a general framework. A second solution 
is to use a nonparametric approximation based on a preliminary MCMC sample, even 
though the accuracy may also suffer in large dimensions. In the special setting of latent 
variables models (like mixtures of distributions but also like probit models), Chib's 
(1995) approximation is particularly attractive as there exists a natural approximation 



to 7Tfc(6%), based on the Rao-Blackwell (Gelfand and Smith 1990) estimate 



1 T 

*=i 

where the s are the latent variables simulated by the MCMC sampler. The esti- 
mate ifk(0t\y) * s a parametric unbiased approximation of ^k{0* k \y) that converges with 
rate Q(\/T). This Rao-Blackwell approximation obviously requires the full conditional 
density n k (9l\y, z) to be available in closed form (constant included) but, as already 
explained, this is the case for the probit model. 

Figure [5] and Table [T] summarize the results obtained for 100 replications of Chib's 
approximations of -Boi(l/) with T = 20, 000 simulations for each approximation of rrik{y) 
(k = 0,1). While Chib's method is usually very reliable and dominates importance 
sampling, the incredibly good approximation provided by the asymptotic Gaussian 
distribution implies that, in this particular case, Chib's method is dominated by both 
the importance sampling and the harmonic mean estimates. 
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Figure 5: Pima Indian dataset: boxplots of 100 Chib's, harmonic mean and importance 
estimates of B i(y), based on simulations from the prior distributions, for 2 x 10 4 
simulations. 



Table 1: Pima Indian dataset: Performances of the various approximation methods 
used in this survey. 





Monte 


Importance 


Bridge 


Harmonic 


Chib's 




Carlo 


sampling 


sampling 


mean 


approximation 


Median 


3.277 


3.108 


3.087 


3.107 


3.104 


Standard deviation 


0.7987 


0.0017 


0.1357 


0.0025 


0.0195 


Duration in seconds 


7 


7 


71 


70 


64 



8 Conclusion 

In this short evaluation of the most common estimations to the Bayes factor, we have 
found that a particular importance sampling and its symmetric harmonic mean coun- 
terpart are both very efficient in the case of the probit model. The bridge sampling 
estimate is much less efficient in this example, due to the approximation error result- 
ing from the pseudo-posterior. In most settings, the bridge sampling is actually doing 
better than the equivalent importance sampler ( Robert and Wraith||2009 ) , while Chib's 



method is much more generic than the four alternatives. The recommendation result- 
ing from the short experiment above is therefore to look for handy approximations to 
the posterior distribution, whenever available, but to fall back on Chib's method as a 
backup solution providing a reference or better. 
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